Overview

Dataset statistics

Number of variables: 8
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 31.4 KiB
Average record size in memory: 64.3 B

Variable types

NUM: 7
BOOL: 1

Reproduction

Analysis started: 2020-10-22 22:21:10.705092
Analysis finished: 2020-10-22 22:21:38.721472
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Variables

GRE Score
Real number (ℝ≥0)

Distinct count: 49
Unique (%): 9.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 316.472
Minimum: 290
Maximum: 340
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 290
5-th percentile: 298
Q1: 308
Median: 317
Q3: 325
95-th percentile: 335
Maximum: 340
Range: 50
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.29514837
Coefficient of variation (CV): 0.03569083007
Kurtosis: -0.7110644626
Mean: 316.472
Median Absolute Deviation (MAD): 9.360224
Skewness: -0.03984185809
Sum: 158236
Variance: 127.5803768
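
Several of the figures above are simple functions of the others, which allows a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, and variance for GRE Score from the reported minimum, maximum, quartiles, mean, and standard deviation. The variable names are illustrative only, and small differences in the last digits are expected because the report prints rounded values.

```python
# Values copied from the GRE Score statistics in this report.
minimum, maximum = 290, 340
q1, q3 = 308, 325
mean = 316.472
std = 11.29514837

# Derived quantities (the reported figure is given in each comment).
value_range = maximum - minimum   # reported Range: 50
iqr = q3 - q1                     # reported IQR: 17
cv = std / mean                   # reported CV: 0.03569083007
variance = std ** 2               # reported Variance: 127.5803768

print(f"range={value_range}, IQR={iqr}, CV={cv:.10f}, variance={variance:.6f}")
```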
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[290. 294.5 310.5 327.5 340. ], "bayesian blocks" binning strategy used)
Common values

Value              Count  Frequency (%)
312                24     4.8%
324                23     4.6%
316                18     3.6%
321                17     3.4%
322                17     3.4%
327                17     3.4%
314                16     3.2%
311                16     3.2%
320                16     3.2%
317                15     3.0%
Other values (39)  321    64.2%

Minimum 5 values

Value  Count  Frequency (%)
290    2      0.4%
293    1      0.2%
294    2      0.4%
295    5      1.0%
296    5      1.0%

Maximum 5 values

Value  Count  Frequency (%)
340    9      1.8%
339    3      0.6%
338    4      0.8%
337    2      0.4%
336    5      1.0%

TOEFL Score
Real number (ℝ≥0)

Distinct count: 29
Unique (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.192
Minimum: 92
Maximum: 120
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 92
5-th percentile: 98
Q1: 103
Median: 107
Q3: 112
95-th percentile: 118
Maximum: 120
Range: 28
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 6.08186766
Coefficient of variation (CV): 0.05673807429
Kurtosis: -0.6532454042
Mean: 107.192
Median Absolute Deviation (MAD): 5.054592
Skewness: 0.09560097236
Sum: 53596
Variance: 36.98911423
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 92. 95.5 98.5 113.5 120. ], "bayesian blocks" binning strategy used)
Common values

Value              Count  Frequency (%)
110                44     8.8%
105                37     7.4%
104                29     5.8%
112                28     5.6%
107                28     5.6%
106                28     5.6%
103                25     5.0%
102                24     4.8%
100                24     4.8%
99                 23     4.6%
Other values (19)  210    42.0%

Minimum 5 values

Value  Count  Frequency (%)
92     1      0.2%
93     2      0.4%
94     2      0.4%
95     3      0.6%
96     6      1.2%

Maximum 5 values

Value  Count  Frequency (%)
120    9      1.8%
119    10     2.0%
118    10     2.0%
117    8      1.6%
116    16     3.2%

University Rating
Real number (ℝ≥0)

Distinct count: 5
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.114
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.143511801
Coefficient of variation (CV): 0.3672163779
Kurtosis: -0.8100796635
Mean: 3.114
Median Absolute Deviation (MAD): 0.922832
Skewness: 0.09029498313
Sum: 1557
Variance: 1.307619238
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 1.5 5. ], "bayesian blocks" binning strategy used)
Common values

Value  Count  Frequency (%)
3      162    32.4%
2      126    25.2%
4      105    21.0%
5      73     14.6%
1      34     6.8%

Minimum 5 values

Value  Count  Frequency (%)
1      34     6.8%
2      126    25.2%
3      162    32.4%
4      105    21.0%
5      73     14.6%

Maximum 5 values

Value  Count  Frequency (%)
5      73     14.6%
4      105    21.0%
3      162    32.4%
2      126    25.2%
1      34     6.8%

SOP
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.374
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1.5
Q1: 2.5
Median: 3.5
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 0.9910036208
Coefficient of variation (CV): 0.2937177299
Kurtosis: -0.7057169536
Mean: 3.374
Median Absolute Deviation (MAD): 0.824128
Skewness: -0.2289723963
Sum: 1687
Variance: 0.9820881764
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 1.75 2.25 5. ], "bayesian blocks" binning strategy used)
Common values

Value  Count  Frequency (%)
4      89     17.8%
3.5    88     17.6%
3      80     16.0%
2.5    64     12.8%
4.5    63     12.6%
2      43     8.6%
5      42     8.4%
1.5    25     5.0%
1      6      1.2%

Minimum 5 values

Value  Count  Frequency (%)
1      6      1.2%
1.5    25     5.0%
2      43     8.6%
2.5    64     12.8%
3      80     16.0%

Maximum 5 values

Value  Count  Frequency (%)
5      42     8.4%
4.5    63     12.6%
4      89     17.8%
3.5    88     17.6%
3      80     16.0%

LOR
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.484
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 3.5
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9254495739
Coefficient of variation (CV): 0.2656284655
Kurtosis: -0.7457485106
Mean: 3.484
Median Absolute Deviation (MAD): 0.758752
Skewness: -0.1452903146
Sum: 1742
Variance: 0.8564569138
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 1.75 2.75 5. ], "bayesian blocks" binning strategy used)
Common values

Value  Count  Frequency (%)
3      99     19.8%
4      94     18.8%
3.5    86     17.2%
4.5    63     12.6%
5      50     10.0%
2.5    50     10.0%
2      46     9.2%
1.5    11     2.2%
1      1      0.2%

Minimum 5 values

Value  Count  Frequency (%)
1      1      0.2%
1.5    11     2.2%
2      46     9.2%
2.5    50     10.0%
3      99     19.8%

Maximum 5 values

Value  Count  Frequency (%)
5      50     10.0%
4.5    63     12.6%
4      94     18.8%
3.5    86     17.2%
3      99     19.8%

CGPA
Real number (ℝ≥0)

Distinct count: 184
Unique (%): 36.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.57644
Minimum: 6.8
Maximum: 9.92
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 6.8
5-th percentile: 7.638
Q1: 8.1275
Median: 8.56
Q3: 9.04
95-th percentile: 9.6
Maximum: 9.92
Range: 3.12
Interquartile range (IQR): 0.9125

Descriptive statistics

Standard deviation: 0.6048128003
Coefficient of variation (CV): 0.07052026253
Kurtosis: -0.5612783981
Mean: 8.57644
Median Absolute Deviation (MAD): 0.4982488
Skewness: -0.02661251732
Sum: 4288.22
Variance: 0.3657985234
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[6.8 7.205 7.62 7.685 7.85 9.265 9.92 ], "bayesian blocks" binning strategy used)
Common values

Value               Count  Frequency (%)
8                   9      1.8%
8.76                9      1.8%
8.54                7      1.4%
8.45                7      1.4%
8.56                7      1.4%
8.12                7      1.4%
7.88                6      1.2%
8.64                6      1.2%
8.66                6      1.2%
9.11                6      1.2%
Other values (174)  430    86.0%

Minimum 5 values

Value  Count  Frequency (%)
6.8    1      0.2%
7.2    1      0.2%
7.21   1      0.2%
7.23   1      0.2%
7.25   1      0.2%

Maximum 5 values

Value  Count  Frequency (%)
9.92   1      0.2%
9.91   1      0.2%
9.87   2      0.4%
9.86   1      0.2%
9.82   1      0.2%

Research
Boolean

Distinct count: 2
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Value  Count  Frequency (%)
1      280    56.0%
0      220    44.0%

Chance of Admit
Real number (ℝ≥0)

Distinct count: 61
Unique (%): 12.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.72174
Minimum: 0.34
Maximum: 0.97
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.0 KiB

Quantile statistics

Minimum: 0.34
5-th percentile: 0.47
Q1: 0.63
Median: 0.72
Q3: 0.82
95-th percentile: 0.94
Maximum: 0.97
Range: 0.63
Interquartile range (IQR): 0.19

Descriptive statistics

Standard deviation: 0.141140404
Coefficient of variation (CV): 0.1955557458
Kurtosis: -0.4546817998
Mean: 0.72174
Median Absolute Deviation (MAD): 0.11391392
Skewness: -0.28996621
Sum: 360.87
Variance: 0.01992061363
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.34 0.435 0.605 0.805 0.97 ], "bayesian blocks" binning strategy used)
Common values

Value              Count  Frequency (%)
0.71               23     4.6%
0.64               19     3.8%
0.73               18     3.6%
0.72               16     3.2%
0.79               16     3.2%
0.78               15     3.0%
0.76               14     2.8%
0.8                13     2.6%
0.7                13     2.6%
0.94               13     2.6%
Other values (51)  340    68.0%

Minimum 5 values

Value  Count  Frequency (%)
0.34   2      0.4%
0.36   2      0.4%
0.37   1      0.2%
0.38   2      0.4%
0.39   1      0.2%

Maximum 5 values

Value  Count  Frequency (%)
0.97   4      0.8%
0.96   8      1.6%
0.95   5      1.0%
0.94   13     2.6%
0.93   12     2.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for an exact linear relationship the steepness of the line (its angle to the x-axis) does not change the magnitude of r; only the direction of the slope sets its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
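
The calculation just described can be sketched in plain Python. This is a minimal illustration, not the code pandas-profiling runs: `pearson_r` is a hypothetical helper name, and the example data are the GRE Score and Chance of Admit columns from the ten "First rows" shown in the Sample section of this report, so the printed value describes those ten rows only.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so sums of products and sums of squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# First ten rows of the dataset, as listed under "First rows".
gre = [337, 324, 316, 322, 314, 330, 321, 308, 302, 323]
chance = [0.92, 0.76, 0.72, 0.80, 0.65, 0.90, 0.75, 0.68, 0.50, 0.45]

r = pearson_r(gre, chance)
print(f"Pearson r on the first ten rows: {r:.4f}")
```

Rescaling or shifting either input (for example `[3 * x + 7 for x in gre]`) leaves the result unchanged, which is the invariance property mentioned above.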

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
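
As a rough sketch of that recipe, the code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their rank positions is an assumption here (it is a common convention, but the text above does not specify tie handling).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# A monotonic but nonlinear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(f"Spearman rho: {spearman_rho(xs, ys):.4f}")
```

Because only the ordering of the values matters, any strictly increasing transformation of either variable leaves ρ unchanged.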

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
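
The pair-counting definition can be sketched directly, as below. Taken literally, the formula in the text is the variant usually called tau-a; many libraries (SciPy's `kendalltau` among them) default to tau-b, which adjusts the denominator for ties, so results on tie-heavy columns may differ from this sketch. `kendall_tau_a` is a hypothetical helper name, and counting tied pairs as neither concordant nor discordant is an assumption.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, per the definition above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither (assumption)
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, which is fine for a 500-row dataset like this one but slow for large inputs.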

Missing values

Sample

First rows

     GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0    337        118          4                  4.5  4.5  9.65  1         0.92
1    324        107          4                  4.0  4.5  8.87  1         0.76
2    316        104          3                  3.0  3.5  8.00  1         0.72
3    322        110          3                  3.5  2.5  8.67  1         0.80
4    314        103          2                  2.0  3.0  8.21  0         0.65
5    330        115          5                  4.5  3.0  9.34  1         0.90
6    321        109          3                  3.0  4.0  8.20  1         0.75
7    308        101          2                  3.0  4.0  7.90  0         0.68
8    302        102          1                  2.0  1.5  8.00  0         0.50
9    323        108          3                  3.5  3.0  8.60  0         0.45

Last rows

     GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
490  307        105          2                  2.5  4.5  8.12  1         0.67
491  297        99           4                  3.0  3.5  7.81  0         0.54
492  298        101          4                  2.5  4.5  7.69  1         0.53
493  300        95           2                  3.0  1.5  8.22  1         0.62
494  301        99           3                  2.5  2.0  8.45  1         0.68
495  332        108          5                  4.5  4.0  9.02  1         0.87
496  337        117          5                  5.0  5.0  9.87  1         0.96
497  330        120          5                  4.5  5.0  9.56  1         0.93
498  312        103          4                  4.0  5.0  8.43  0         0.73
499  327        113          4                  4.5  4.5  9.04  0         0.84